
Developing an Open-Source SIEM System

Anomaly Detection Classification SVM Scikit Learn Auto Encoder PyTorch Numpy CI/CD ETL Kafka Python Data Visualization Grafana Prometheus OpenSearch Dashboards Kubernetes Docker ArgoCD DevOps Data Scientist Data Engineer Plotly Software Development Machine Learning Apache Spark SQL SAP Scrum Jira Confluence Git

A Security Information and Event Management (SIEM) system based on open-source components is being developed for a public administration. The SIEM offers a range of functions comparable to Splunk, IBM QRadar, or ArcSight. Development and deployment must meet big-data requirements: the application has to ingest various log formats, normalize them, and enrich them where necessary. The logs can then be analyzed in OpenSearch Dashboards. Beyond the core application, the SIEM itself must be monitorable with tools such as Prometheus and Grafana.

```mermaid
graph LR
  subgraph Enrichment
    A[Logsource] --> B[Kafka] --> C[Normalization/Enrichment] --> D[OpenSearch]
  end
  subgraph Monitoring
    E[Prometheus]
    F[Grafana]
  end
  E --> B
  E --> C
  E --> D
  F --> E
```

Normalization #

Normalization is carried out by an ETL tool developed in-house in Python. The tool accepts log sources via an HTTP interface and writes them to Kafka. If the required resources are available, it can transform the logs according to a rule tree and enrich them with additional information such as geolocation data. The goal of normalization is to bring the logs into the Elastic Common Schema (ECS) format. Beyond enrichment and normalization, the tool also provides functionality for the early detection of malicious activity: classic rule-based matching on the one hand, and an SVM classifier on the other to flag log data crafted to evade those rules. A minimal sketch of the ingestion-and-normalization path is shown below.
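
The following is a minimal sketch of how such an ingestion-and-normalization step could look, assuming a FastAPI HTTP endpoint and the kafka-python client; the endpoint path, ECS field mapping, topic name, and broker address are illustrative assumptions, not the actual implementation.

```python
# Hypothetical sketch: HTTP ingestion -> ECS-style normalization -> Kafka.
# Endpoint, field mapping, topic name, and broker address are assumptions.
import json
from datetime import datetime, timezone

from fastapi import FastAPI, Request
from kafka import KafkaProducer

app = FastAPI()
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",                      # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode(),   # serialize dicts as JSON
)

def to_ecs(raw: dict) -> dict:
    """Map a raw log record onto a small subset of ECS fields (illustrative)."""
    return {
        "@timestamp": raw.get("time", datetime.now(timezone.utc).isoformat()),
        "message": raw.get("msg", ""),
        "source": {"ip": raw.get("src_ip")},
        "event": {"original": json.dumps(raw)},
    }

@app.post("/ingest")
async def ingest(request: Request):
    raw = await request.json()
    producer.send("logs.normalized", to_ecs(raw))  # assumed topic name
    return {"status": "accepted"}
```

In the real pipeline, the rule tree and the enrichment steps (for example geolocation lookups) would run between the field mapping and the Kafka write.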

Anomaly detection #

OpenSearch’s integrated anomaly detection is used to identify deviating log behavior early and draw analysts’ attention to it. It relies on a random cut forest model trained on our own data; a sketch of a corresponding detector definition is shown below.
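
As a hedged sketch, such a detector could be created via the OpenSearch Anomaly Detection plugin's REST API; the index pattern, feature aggregation, intervals, and credentials below are assumptions rather than the production configuration.

```python
# Hypothetical sketch: creating a detector via the OpenSearch Anomaly Detection
# plugin REST API. Index pattern, feature, intervals, and credentials are assumptions.
import requests

detector = {
    "name": "failed-logins-detector",
    "description": "Detect unusual spikes in failed logins",
    "time_field": "@timestamp",
    "indices": ["logs-*"],                               # assumed index pattern
    "feature_attributes": [{
        "feature_name": "failed_logins",
        "feature_enabled": True,
        "aggregation_query": {
            "failed_logins": {"value_count": {"field": "source.ip"}}
        },
    }],
    "detection_interval": {"period": {"interval": 10, "unit": "Minutes"}},
    "window_delay": {"period": {"interval": 1, "unit": "Minutes"}},
}

resp = requests.post(
    "https://opensearch:9200/_plugins/_anomaly_detection/detectors",
    json=detector,
    auth=("admin", "admin"),   # assumed credentials
    verify=False,              # assumed self-signed certificate in a lab setup
)
print(resp.json())
```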

In addition to the onboard functionality of OpenSearch, autoencoders were also evaluated for anomaly detection. This was realized with a PyTorch Lightning implementation and synthetic log data.
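
The following is a minimal sketch of such an autoencoder with PyTorch Lightning; the layer sizes, feature dimension, and thresholding approach are assumptions rather than the evaluated configuration.

```python
# Hypothetical sketch of an autoencoder over vectorized log features with
# PyTorch Lightning; layer sizes and the reconstruction-error threshold
# are assumptions, not the evaluated model.
import torch
from torch import nn
import pytorch_lightning as pl

class LogAutoencoder(pl.LightningModule):
    def __init__(self, n_features: int = 32, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 16), nn.ReLU(), nn.Linear(16, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, n_features)
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

    def training_step(self, batch, batch_idx):
        x = batch[0]
        loss = nn.functional.mse_loss(self(x), x)  # reconstruction error
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Training runs via pl.Trainer(max_epochs=...).fit(model, dataloader).
# At inference time, records whose reconstruction error exceeds a chosen
# threshold are flagged as anomalous.
```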

AIOps #

In addition to classic application monitoring with Prometheus and Grafana, the monitoring stack was supplemented with prediction models. Prometheus thus exposes additional metrics that represent expected behavior; by comparing them with the actual data, potential problems can be identified early when certain thresholds are exceeded. A sketch of such a forecast exporter is shown below.
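
As a hedged sketch of this idea, a small exporter could forecast a metric and expose the expected value for Prometheus to scrape, so alert rules can compare expected with actual behavior. The Prometheus address, the queried metric, and the naive linear forecast are assumptions; the actual prediction models may be more elaborate.

```python
# Hypothetical sketch: expose a forecast metric that Prometheus can scrape and
# compare against the live series. Address, metric name, and the naive linear
# forecast are assumptions.
import time
import numpy as np
import requests
from prometheus_client import Gauge, start_http_server

PROM_URL = "http://prometheus:9090/api/v1/query_range"   # assumed address
expected = Gauge("expected_ingest_rate", "Forecasted log ingest rate")

def forecast_next_value() -> float:
    """Fit a linear trend on the last hour and extrapolate 5 minutes ahead."""
    end = time.time()
    resp = requests.get(PROM_URL, params={
        "query": "rate(kafka_messages_in_total[5m])",     # assumed metric name
        "start": end - 3600, "end": end, "step": 60,
    })
    values = resp.json()["data"]["result"][0]["values"]   # assumes a non-empty result
    ts = np.array([float(t) for t, _ in values])
    ys = np.array([float(v) for _, v in values])
    slope, intercept = np.polyfit(ts, ys, 1)
    return float(slope * (end + 300) + intercept)

if __name__ == "__main__":
    start_http_server(8000)          # Prometheus scrapes this exporter
    while True:
        expected.set(forecast_next_value())
        time.sleep(60)
```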

Activities #

  • Development of ETL software for the normalization and preparation of log data in Python
  • Development of a system for anomaly detection
  • Development of an SVM classifier for suspicious log data
  • Support in the planning and development of SIEM components
  • Setup and configuration of the components
  • CI/CD configurations in GitLab and GitHub
  • Data visualizations of events and metrics in Grafana and OpenSearch Dashboards
  • Deployment in Kubernetes clusters with ArgoCD and DevOps practices